Finding crime patterns in Montgomery County

Group components

Cephas Barreto- cephasax [at] gmail [dot] com>
Marco Olimpio - marco.olimpio [at] gmail [dot] com
Rebecca Betwel - bekbetwel [at] gmail [dot] com

About the dataset

The data presented is derived from reported crimes classified according to Maryland criminal code and documented by approved police incident reports. The data about crimes do not put info about the victins and its masks the actual address not putting the exact place where the complaint occured.

Source: https://data.world/jboutros/montgomery-county-crime

Maryland County Area

Checking about data available

Incident ID: Looks like a simple table identification number
CR Number: CR stands for Complaint Register and its a identification for a Compleint Process for a full disciplinary investigation
Dispatch Date/Time: Looks the date and time when the complaint was made
Class

Class number:Identification number of the complaint
Class description:Description of the class

Complaint

Public place

Police District Name: Auto describes it
Police District Number: Auto describes it
Block address: Auto describes it
City: Auto describes it
State: Auto describes it
Zip Code: Auto descrives it
Agency: Agency responsable for this address
Place: Kind of place where the crime occured
Sector

Rockville District: Sectors A, B, C
Bethesda District: Sectors D, E
Silver Spring District: Sector G
Wheaton-Glenmont District: Sector J, K
Germantown District: Sectors M, N, P

Address Number: Auto describes it
Beat: Beat is the territory and time that a police officer patrols
PRA: Police Reporting Area
Latitude: Auto describes it
Longitude: Auto describes it
Location: Tuple of Latitude and Longitude

Complaint estimative

Start Data/Time: Start of the complaint
End Date/Time: End of the complaint

Dataset questions

About Type of complaint

Which complaint is most common?
What are the categories of complaints?
Could we categorize the types of crimes in violent or not?

About Period of time/day of the week

Wich period of the day that most complaints occur
Wich day of the week that most complaints occur
Wich month of the years that most complaints occur
These complainsts are realted with holidays?
What period of time (time of day/day of the week/month of the year) has correlation with the type of complaint

About Location

Where is most of the complaints?
What sort of places have most complaints
What sort of place has correlation with the type of complaint

Correlation between locale and type of complaint

Is there a correlations between the day of the week and kind of complaint?

References

https://www.montgomerycountymd.gov/pol/districts/whatsmydistrict.html
http://www.ericcarlson.net/scanner/police.html

The begining

Fist of all we need to import libraries needed. NumPy, Pandas, Matplotlib and Bokeh are needed to run the scripts of this notebook. Below we can see how to import these libraries and how to configure Bokeh to show chart inline (calling output_notebook() function)



In [65]:

    
import pandas as pd
import numpy  as np
import matplotlib.pyplot as plt
import seaborn

#Importing Bokeh library, updated to version 0.12.9
from bokeh.io import push_notebook, show, output_notebook, output_file
from bokeh.layouts import row
from bokeh.plotting import figure
from bokeh.sampledata.commits import data

from bokeh.models import (
    GMapPlot, GMapOptions, ColumnDataSource, Circle, DataRange1d, PanTool, WheelZoomTool, BoxSelectTool, Jitter
)
from bokeh.core.properties import field
#from bokeh.transform import jitter
#from bokeh.models.transforms import Jitter

#Inline bokeh charts
output_notebook()









    





    
        
        Loading BokehJS ...

Loading datasets



In [89]:

    
#Loading dataset from Montgomery County complaint dataset
monty_data = pd.read_csv("MontgomeryCountyCrime2013.csv")
monty_data.head()









    Out[89]:






  
    
      
      Incident ID
      CR Number
      Dispatch Date / Time
      Class
      Class Description
      Police District Name
      Block Address
      City
      State
      Zip Code
      ...
      Sector
      Beat
      PRA
      Start Date / Time
      End Date / Time
      Latitude
      Longitude
      Police District Number
      Location
      Address Number
    
  
  
    
      0
      200939101
      13047006
      10/02/2013 07:52:41 PM
      511
      BURG FORCE-RES/NIGHT
      OTHER
      25700  MT RADNOR DR
      DAMASCUS
      MD
      20872.0
      ...
      NaN
      NaN
      NaN
      10/02/2013 07:52:00 PM
      NaN
      NaN
      NaN
      OTHER
      NaN
      25700.0
    
    
      1
      200952042
      13062965
      12/31/2013 09:46:58 PM
      1834
      CDS-POSS MARIJUANA/HASHISH
      GERMANTOWN
      GUNNERS BRANCH  RD
      GERMANTOWN
      MD
      20874.0
      ...
      M
      5M1
      470.0
      12/31/2013 09:46:00 PM
      NaN
      NaN
      NaN
      5D
      NaN
      NaN
    
    
      2
      200926636
      13031483
      07/06/2013 09:06:24 AM
      1412
      VANDALISM-MOTOR VEHICLE
      MONTGOMERY VILLAGE
      OLDE TOWNE  AVE
      GAITHERSBURG
      MD
      20877.0
      ...
      P
      6P3
      431.0
      07/06/2013 09:06:00 AM
      NaN
      NaN
      NaN
      6D
      NaN
      NaN
    
    
      3
      200929538
      13035288
      07/28/2013 09:13:15 PM
      2752
      FUGITIVE FROM JUSTICE(OUT OF STATE)
      BETHESDA
      BEACH  DR
      CHEVY CHASE
      MD
      20815.0
      ...
      D
      2D1
      11.0
      07/28/2013 09:13:00 PM
      NaN
      NaN
      NaN
      2D
      NaN
      NaN
    
    
      4
      200930689
      13036876
      08/06/2013 05:16:17 PM
      2812
      DRIVING UNDER THE INFLUENCE
      BETHESDA
      BEACH  DR
      SILVER SPRING
      MD
      20815.0
      ...
      D
      2D3
      178.0
      08/06/2013 05:16:00 PM
      NaN
      NaN
      NaN
      2D
      NaN
      NaN
    
  

5 rows × 22 columns

As we described we have all theses coluns in this dataset.



In [90]:

    
monty_data.columns









    Out[90]:





Index(['Incident ID', 'CR Number', 'Dispatch Date / Time', 'Class',
       'Class Description', 'Police District Name', 'Block Address', 'City',
       'State', 'Zip Code', 'Agency', 'Place', 'Sector', 'Beat', 'PRA',
       'Start Date / Time', 'End Date / Time', 'Latitude', 'Longitude',
       'Police District Number', 'Location', 'Address Number'],
      dtype='object')

Analysing the dataset loaded we can check the we have complaints 23369



In [91]:

    
number_of_registries = monty_data.shape
print(number_of_registries[0])

Which sort of complaints are most common and Pareto Analisys?



In [92]:

    
#Using the agg function allows you to calculate the frequency for each group using the standard library function len.
#Sorting the result by the aggregated column code_count values, in descending order, then head selecting the top n records, then reseting the frame; will produce the top n frequent records
top = monty_data.groupby(['Class','Class Description'])['Class'].agg({"frequency": len}).sort_values("frequency", ascending=False).head(43).reset_index()
top['frequency'] = (top['frequency']/number_of_registries[0])*100
top









    Out[92]:






  
    
      
      Class
      Class Description
      frequency
    
  
  
    
      0
      2812
      DRIVING UNDER THE INFLUENCE
      7.317386
    
    
      1
      1834
      CDS-POSS MARIJUANA/HASHISH
      5.708417
    
    
      2
      2938
      POL INFORMATION
      5.096495
    
    
      3
      614
      LARCENY FROM AUTO OVER $200
      3.911164
    
    
      4
      617
      LARCENY FROM BUILDING OVER $200
      3.829860
    
    
      5
      2942
      MENTAL TRANSPORT
      3.598785
    
    
      6
      1412
      VANDALISM-MOTOR VEHICLE
      3.260730
    
    
      7
      2941
      LOST PROPERTY
      3.119517
    
    
      8
      619
      LARCENY OTHER OVER $200
      2.790021
    
    
      9
      634
      LARCENY FROM AUTO UNDER $50
      2.610296
    
    
      10
      1013
      FORGERY/CNTRFT - IDENTITY THEFT
      2.550387
    
    
      11
      613
      LARCENY SHOPLIFTING OVER $200
      2.263683
    
    
      12
      623
      LARCENY SHOPLIFTING $50 - $199
      2.092516
    
    
      13
      2216
      LIQUOR - DRINK IN PUB OVER 21
      2.062562
    
    
      14
      2413
      DISORDERLY CONDUCT
      1.985536
    
    
      15
      811
      ASSAULT & BATTERY - CITIZEN
      1.634644
    
    
      16
      624
      LARCENY FROM AUTO $50 - $199
      1.523386
    
    
      17
      1011
      FORGERY/CNTRFT-CRDT CARDS
      1.463477
    
    
      18
      2943
      MISSING PERSON
      1.459198
    
    
      19
      821
      SIMPLE ASSAULT - CITIZEN
      1.330823
    
    
      20
      1864
      CDS IMPLMNT-MARIJUANA/HASHISH
      1.270914
    
    
      21
      711
      AUTO THEFT - PASSENGER VEHICLE
      1.253798
    
    
      22
      813
      ASSAULT & BATTERY SPOUSE/PARTNER
      1.240960
    
    
      23
      635
      LARCENY AUTO PART UNDER $50
      1.142539
    
    
      24
      2111
      JUVENILE RUNAWAY
      1.104027
    
    
      25
      633
      LARCENY SHOPLIFTING UNDER $50
      1.104027
    
    
      26
      512
      BURG FORCE-RES/DAY
      1.027002
    
    
      27
      2737
      TRESPASSING
      0.954256
    
    
      28
      2913
      SUDDEN DEATH NATURAL
      0.954256
    
    
      29
      627
      LARCENY FROM BUILDING $50-$199
      0.945697
    
    
      30
      1411
      VANDALISM-DWELLING
      0.907185
    
    
      31
      639
      LARCENY OTHER UNDER $50
      0.894347
    
    
      32
      629
      LARCENY OTHER $50 - $199
      0.864393
    
    
      33
      1014
      FORGERY/CNTRFT-ALL OTHER
      0.842997
    
    
      34
      513
      BURG FORCE-RES/TIME UNK
      0.787368
    
    
      35
      616
      LARCENY BICYCLE OVER $200
      0.761693
    
    
      36
      1417
      VANDALISM-OTHER
      0.736018
    
    
      37
      637
      LARCENY FROM BLDG UNDER $50
      0.706064
    
    
      38
      1824
      CDS-SELL-MARIJUANA/HASHISH
      0.671830
    
    
      39
      514
      BURG FORCE-COMM/NIGHT
      0.637597
    
    
      40
      2946
      RECOVERED PROPERTY/MONT. CO.
      0.629038
    
    
      41
      1012
      FORGERY/CNTRFT-CHECKS
      0.624759
    
    
      42
      511
      BURG FORCE-RES/NIGHT
      0.616201

As we can see the most common type of complaint is "Driving under influence". Unfortunatelly we do not have the type of subtance influencing the driver like alcohool and other drugs. Altought, interestingly the second most common complaint is "Possession of CDS (Controlled Dangerous Substance) marijuana/hashish". But we also can not correlate the substance of with the drives is influenced with the possession of marijuana/hashish.

The number of 43 Classes of crimes was not a magic number. These 43 classes of crimes make 80% of the total of crimes and the 20% are distributed in the other 242 classes.

The main purpose of this kind of approach is to be right to the point, where the overwhelming majority of the complaints are related. The Police Office or County's board of advisors could make decisions to prevent or mitigate these kinds of problems based on the ammount of crimes. This kind of approach will be time saving :] in the methods that cannot be fully automated, like classification of crimes in violent or not or the description of 'master classes'.



In [70]:

    
from decimal import *
#Configure precision
getcontext().prec = 2

parcial_perc = top['frequency'].sum()
parcial_perc = round(parcial_perc,2)
tot_classes = monty_data['Class'].value_counts(normalize=True, sort=True).shape[0]
print("The crimes above are responsible for up to " + str(parcial_perc) + "% (Pareto Analysis) of the total crimes. Performed by "+ str(top.shape[0])+" out of "+ str(tot_classes)+" classes of crimes!")
print("For precision purposes we had only " + str(round((43/285)*100,2)) + "% classes that represent 80% of the total of complaints")









    



The crimes above are responsible for up to 80.29% (Pareto Analysis) of the total crimes. Performed by 43 out of 285 classes of crimes!
For precision purposes we had only 15.09% classes that represent 80% of the total of complaints

For reference: https://en.wikipedia.org/wiki/Pareto_analysis

What are the Classes of Classes (Master Classes) of complaints?

In terms of granularity we cannot extract much of information utilizing the classification system adopted. So we observed that there is a possibility to aglutinate a set of classes in a 'master class'. That could agregate a number of crimes in a single class as we can see below:



In [93]:

    
#Creating a master class to categorize crimes
classaux = monty_data["Class"]/100
classaux = classaux.astype(int)
classaux = classaux*100

#Inserting this new data in the dataset
monty_data["MasterClass"] = classaux
monty_data[["Class","Class Description",'MasterClass']].head(5)









    Out[93]:






  
    
      
      Class
      Class Description
      MasterClass
    
  
  
    
      0
      511
      BURG FORCE-RES/NIGHT
      500
    
    
      1
      1834
      CDS-POSS MARIJUANA/HASHISH
      1800
    
    
      2
      1412
      VANDALISM-MOTOR VEHICLE
      1400
    
    
      3
      2752
      FUGITIVE FROM JUSTICE(OUT OF STATE)
      2700
    
    
      4
      2812
      DRIVING UNDER THE INFLUENCE
      2800

We can observe that line 17 and 19 below are respectively from class 1834 and 1833. Looking at its 'Class Description' both are classified as Controlle Dangerous Substance Possession and could be agregated in a major class 1800.



In [95]:

    
monty_data[17:25].head(3)









    Out[95]:






  
    
      
      Incident ID
      CR Number
      Dispatch Date / Time
      Class
      Class Description
      Police District Name
      Block Address
      City
      State
      Zip Code
      ...
      Beat
      PRA
      Start Date / Time
      End Date / Time
      Latitude
      Longitude
      Police District Number
      Location
      Address Number
      MasterClass
    
  
  
    
      17
      200925988
      13030561
      07/01/2013 03:26:03 AM
      1834
      CDS-POSS MARIJUANA/HASHISH
      SILVER SPRING
      12200  NEW HAMPSHIRE AVE
      SILVER SPRING
      MD
      20904.0
      ...
      3I1
      519.0
      07/01/2013 03:26:00 AM
      NaN
      39.055327
      -76.995580
      3D
      (39.055326908694369, -76.99557965357296)
      12200.0
      1800
    
    
      18
      200925991
      13030553
      07/01/2013 01:03:17 AM
      2812
      DRIVING UNDER THE INFLUENCE
      SILVER SPRING
      8700  PLYMOUTH ST
      SILVER SPRING
      MD
      20901.0
      ...
      3H1
      126.0
      07/01/2013 01:03:00 AM
      NaN
      38.999406
      -77.007736
      3D
      (38.999406266117461, -77.007735832110484)
      8700.0
      2800
    
    
      19
      200925992
      13030559
      07/01/2013 01:49:33 AM
      1833
      CDS-POSS COCAINE& DERIVATIVES
      GERMANTOWN
      12700  MIDDLEBROOK RD
      GERMANTOWN
      MD
      20874.0
      ...
      5N1
      595.0
      07/01/2013 01:49:00 AM
      NaN
      39.176504
      -77.263908
      5D
      (39.176504471375559, -77.263907914323482)
      12700.0
      1800
    
  

3 rows × 23 columns

Bellow we ranked by frequency the lista of crimes and it's 'master class'



In [117]:

    
#Considering the top crimes

#copy
top_classes_top = top

#Creation of a Master Class
top_classes_top['Master Class'] = 0
aux = top_classes_top['Master Class'].astype(float,copy=True)
top_classes_top['Master Class'] = aux
top_classes_top['Master Class'] = top_classes_top['Class']/100
top_classes_top['Master Class'] = top_classes_top['Master Class'].round()
top_classes_top['Master Class'] = top_classes_top['Master Class']*100
aux = top_classes_top['Master Class'].astype(int,copy=True)
top_classes_top['Master Class'] = aux

top_classes_top









    Out[117]:






  
    
      
      Class
      Class Description
      frequency
      Master Class
    
  
  
    
      0
      2812
      DRIVING UNDER THE INFLUENCE
      7.317386
      2800
    
    
      1
      1834
      CDS-POSS MARIJUANA/HASHISH
      5.708417
      1800
    
    
      2
      2938
      POL INFORMATION
      5.096495
      2900
    
    
      3
      614
      LARCENY FROM AUTO OVER $200
      3.911164
      600
    
    
      4
      617
      LARCENY FROM BUILDING OVER $200
      3.829860
      600
    
    
      5
      2942
      MENTAL TRANSPORT
      3.598785
      2900
    
    
      6
      1412
      VANDALISM-MOTOR VEHICLE
      3.260730
      1400
    
    
      7
      2941
      LOST PROPERTY
      3.119517
      2900
    
    
      8
      619
      LARCENY OTHER OVER $200
      2.790021
      600
    
    
      9
      634
      LARCENY FROM AUTO UNDER $50
      2.610296
      600
    
    
      10
      1013
      FORGERY/CNTRFT - IDENTITY THEFT
      2.550387
      1000
    
    
      11
      613
      LARCENY SHOPLIFTING OVER $200
      2.263683
      600
    
    
      12
      623
      LARCENY SHOPLIFTING $50 - $199
      2.092516
      600
    
    
      13
      2216
      LIQUOR - DRINK IN PUB OVER 21
      2.062562
      2200
    
    
      14
      2413
      DISORDERLY CONDUCT
      1.985536
      2400
    
    
      15
      811
      ASSAULT & BATTERY - CITIZEN
      1.634644
      800
    
    
      16
      624
      LARCENY FROM AUTO $50 - $199
      1.523386
      600
    
    
      17
      1011
      FORGERY/CNTRFT-CRDT CARDS
      1.463477
      1000
    
    
      18
      2943
      MISSING PERSON
      1.459198
      2900
    
    
      19
      821
      SIMPLE ASSAULT - CITIZEN
      1.330823
      800
    
    
      20
      1864
      CDS IMPLMNT-MARIJUANA/HASHISH
      1.270914
      1900
    
    
      21
      711
      AUTO THEFT - PASSENGER VEHICLE
      1.253798
      700
    
    
      22
      813
      ASSAULT & BATTERY SPOUSE/PARTNER
      1.240960
      800
    
    
      23
      635
      LARCENY AUTO PART UNDER $50
      1.142539
      600
    
    
      24
      2111
      JUVENILE RUNAWAY
      1.104027
      2100
    
    
      25
      633
      LARCENY SHOPLIFTING UNDER $50
      1.104027
      600
    
    
      26
      512
      BURG FORCE-RES/DAY
      1.027002
      500
    
    
      27
      2737
      TRESPASSING
      0.954256
      2700
    
    
      28
      2913
      SUDDEN DEATH NATURAL
      0.954256
      2900
    
    
      29
      627
      LARCENY FROM BUILDING $50-$199
      0.945697
      600
    
    
      30
      1411
      VANDALISM-DWELLING
      0.907185
      1400
    
    
      31
      639
      LARCENY OTHER UNDER $50
      0.894347
      600
    
    
      32
      629
      LARCENY OTHER $50 - $199
      0.864393
      600
    
    
      33
      1014
      FORGERY/CNTRFT-ALL OTHER
      0.842997
      1000
    
    
      34
      513
      BURG FORCE-RES/TIME UNK
      0.787368
      500
    
    
      35
      616
      LARCENY BICYCLE OVER $200
      0.761693
      600
    
    
      36
      1417
      VANDALISM-OTHER
      0.736018
      1400
    
    
      37
      637
      LARCENY FROM BLDG UNDER $50
      0.706064
      600
    
    
      38
      1824
      CDS-SELL-MARIJUANA/HASHISH
      0.671830
      1800
    
    
      39
      514
      BURG FORCE-COMM/NIGHT
      0.637597
      500
    
    
      40
      2946
      RECOVERED PROPERTY/MONT. CO.
      0.629038
      2900
    
    
      41
      1012
      FORGERY/CNTRFT-CHECKS
      0.624759
      1000
    
    
      42
      511
      BURG FORCE-RES/NIGHT
      0.616201
      500



In [126]:

    
top_classes_top[['Class Description','frequency']].plot.bar()









    Out[126]:





<matplotlib.axes._subplots.AxesSubplot at 0x1171cff60>

Configuring description of the 'Master Classes'

Analysing the descriptions of the crimes is common to see that the 'class description' are separated with a hyphen sign but not all master classes of crimes could be generalized to adopt the left portion of '-' description. As we can notice below the master class 2900 was classified as 'Misc.' because there is more than one type of crime related and we did not found a better derscription. Its important to notice that we only worked with the top complaints (Pareto). And because of the 20/80 analysis we only have to threat 14 'master classes'.



In [75]:

    
#Inserting the description of the Master Classes
top_classes_top['Master Class Description'] ='' 

top_classes_top[top_classes_top['Master Class'] == 600]
test_top = top_classes_top


test_top.loc[(test_top['Master Class'] ==  600),'Master Class Description'] = 'LARCENY'
test_top.loc[(test_top['Master Class'] == 2900),'Master Class Description'] = 'MISC'
test_top.loc[(test_top['Master Class'] == 1400),'Master Class Description'] = 'VANDALISM'
test_top.loc[(test_top['Master Class'] == 1000),'Master Class Description'] = 'FORGERY/CNTRFT'
test_top.loc[(test_top['Master Class'] ==  500),'Master Class Description'] = 'BURGLARY'
test_top.loc[(test_top['Master Class'] ==  800),'Master Class Description'] = 'ASSAULT & BATTERY'
test_top.loc[(test_top['Master Class'] == 1800),'Master Class Description'] = 'CONTROLLED DANGEROUS SUBSTANCE POSSESSION'
test_top.loc[(test_top['Master Class'] ==  700),'Master Class Description'] = 'THEFT'
test_top.loc[(test_top['Master Class'] == 2100),'Master Class Description'] = 'JUVENILE RUNAWAY'
test_top.loc[(test_top['Master Class'] == 2800),'Master Class Description'] = 'DRIVING UNDER THE INFLUENCE'
test_top.loc[(test_top['Master Class'] == 1900),'Master Class Description'] = 'CONTROLLED DANGEROUS SUBSTANCE IMPLMNT'
test_top.loc[(test_top['Master Class'] == 2200),'Master Class Description'] = 'LIQUOR - DRINK IN PUB OVER 21'
test_top.loc[(test_top['Master Class'] == 2400),'Master Class Description'] = 'DISORDERLY CONDUCT'
test_top.loc[(test_top['Master Class'] == 2700),'Master Class Description'] = 'TRESPASSING'

test_top









    Out[75]:






  
    
      
      Class
      Class Description
      frequency
      Master Class
      Master Class Description
    
  
  
    
      0
      2812
      DRIVING UNDER THE INFLUENCE
      7.317386
      2800
      DRIVING UNDER THE INFLUENCE
    
    
      1
      1834
      CDS-POSS MARIJUANA/HASHISH
      5.708417
      1800
      CONTROLLED DANGEROUS SUBSTANCE POSSESSION
    
    
      2
      2938
      POL INFORMATION
      5.096495
      2900
      MISC
    
    
      3
      614
      LARCENY FROM AUTO OVER $200
      3.911164
      600
      LARCENY
    
    
      4
      617
      LARCENY FROM BUILDING OVER $200
      3.829860
      600
      LARCENY
    
    
      5
      2942
      MENTAL TRANSPORT
      3.598785
      2900
      MISC
    
    
      6
      1412
      VANDALISM-MOTOR VEHICLE
      3.260730
      1400
      VANDALISM
    
    
      7
      2941
      LOST PROPERTY
      3.119517
      2900
      MISC
    
    
      8
      619
      LARCENY OTHER OVER $200
      2.790021
      600
      LARCENY
    
    
      9
      634
      LARCENY FROM AUTO UNDER $50
      2.610296
      600
      LARCENY
    
    
      10
      1013
      FORGERY/CNTRFT - IDENTITY THEFT
      2.550387
      1000
      FORGERY/CNTRFT
    
    
      11
      613
      LARCENY SHOPLIFTING OVER $200
      2.263683
      600
      LARCENY
    
    
      12
      623
      LARCENY SHOPLIFTING $50 - $199
      2.092516
      600
      LARCENY
    
    
      13
      2216
      LIQUOR - DRINK IN PUB OVER 21
      2.062562
      2200
      LIQUOR - DRINK IN PUB OVER 21
    
    
      14
      2413
      DISORDERLY CONDUCT
      1.985536
      2400
      DISORDERLY CONDUCT
    
    
      15
      811
      ASSAULT & BATTERY - CITIZEN
      1.634644
      800
      ASSAULT & BATTERY
    
    
      16
      624
      LARCENY FROM AUTO $50 - $199
      1.523386
      600
      LARCENY
    
    
      17
      1011
      FORGERY/CNTRFT-CRDT CARDS
      1.463477
      1000
      FORGERY/CNTRFT
    
    
      18
      2943
      MISSING PERSON
      1.459198
      2900
      MISC
    
    
      19
      821
      SIMPLE ASSAULT - CITIZEN
      1.330823
      800
      ASSAULT & BATTERY
    
    
      20
      1864
      CDS IMPLMNT-MARIJUANA/HASHISH
      1.270914
      1900
      CONTROLLED DANGEROUS SUBSTANCE IMPLMNT
    
    
      21
      711
      AUTO THEFT - PASSENGER VEHICLE
      1.253798
      700
      THEFT
    
    
      22
      813
      ASSAULT & BATTERY SPOUSE/PARTNER
      1.240960
      800
      ASSAULT & BATTERY
    
    
      23
      635
      LARCENY AUTO PART UNDER $50
      1.142539
      600
      LARCENY
    
    
      24
      2111
      JUVENILE RUNAWAY
      1.104027
      2100
      JUVENILE RUNAWAY
    
    
      25
      633
      LARCENY SHOPLIFTING UNDER $50
      1.104027
      600
      LARCENY
    
    
      26
      512
      BURG FORCE-RES/DAY
      1.027002
      500
      BURGLARY
    
    
      27
      2737
      TRESPASSING
      0.954256
      2700
      TRESPASSING
    
    
      28
      2913
      SUDDEN DEATH NATURAL
      0.954256
      2900
      MISC
    
    
      29
      627
      LARCENY FROM BUILDING $50-$199
      0.945697
      600
      LARCENY
    
    
      30
      1411
      VANDALISM-DWELLING
      0.907185
      1400
      VANDALISM
    
    
      31
      639
      LARCENY OTHER UNDER $50
      0.894347
      600
      LARCENY
    
    
      32
      629
      LARCENY OTHER $50 - $199
      0.864393
      600
      LARCENY
    
    
      33
      1014
      FORGERY/CNTRFT-ALL OTHER
      0.842997
      1000
      FORGERY/CNTRFT
    
    
      34
      513
      BURG FORCE-RES/TIME UNK
      0.787368
      500
      BURGLARY
    
    
      35
      616
      LARCENY BICYCLE OVER $200
      0.761693
      600
      LARCENY
    
    
      36
      1417
      VANDALISM-OTHER
      0.736018
      1400
      VANDALISM
    
    
      37
      637
      LARCENY FROM BLDG UNDER $50
      0.706064
      600
      LARCENY
    
    
      38
      1824
      CDS-SELL-MARIJUANA/HASHISH
      0.671830
      1800
      CONTROLLED DANGEROUS SUBSTANCE POSSESSION
    
    
      39
      514
      BURG FORCE-COMM/NIGHT
      0.637597
      500
      BURGLARY
    
    
      40
      2946
      RECOVERED PROPERTY/MONT. CO.
      0.629038
      2900
      MISC
    
    
      41
      1012
      FORGERY/CNTRFT-CHECKS
      0.624759
      1000
      FORGERY/CNTRFT
    
    
      42
      511
      BURG FORCE-RES/NIGHT
      0.616201
      500
      BURGLARY

It's notorious that "Driving under the influency" is the most common crime commited but when we agregate classes we could take another understading of the most common crimes. As we can see below, the master class 600, the code to 'Larceny', is the most common type of crime with 25.44% of the TOP43 complaints.



In [78]:

    
test_top['Master Class'].value_counts(sort=True)









    Out[78]:





600     14
2900     6
1000     4
500      4
1400     3
800      3
1800     2
700      1
2100     1
2800     1
1900     1
2200     1
2400     1
2700     1
Name: Master Class, dtype: int64



In [107]:

    
test_top.groupby(['Master Class']).sum()









    Out[107]:






  
    
      
      Class
      frequency
    
    
      Master Class
      
      
    
  
  
    
      500
      2050
      3.068167
    
    
      600
      8760
      25.439685
    
    
      700
      711
      1.253798
    
    
      800
      2445
      4.206427
    
    
      1000
      4050
      5.481621
    
    
      1400
      4240
      4.903933
    
    
      1800
      3658
      6.380247
    
    
      1900
      1864
      1.270914
    
    
      2100
      2111
      1.104027
    
    
      2200
      2216
      2.062562
    
    
      2400
      2413
      1.985536
    
    
      2700
      2737
      0.954256
    
    
      2800
      2812
      7.317386
    
    
      2900
      17623
      14.857290

Could we categorize the types of crimes in violent or not?

According to wikipedia (https://en.wikipedia.org/wiki/Violent_crime)violent crimes include, but are not limited to, this list of crimes: Typically, violent criminals includes aircraft hijackers, bank robbers, muggers, burglars, terrorists, carjackers, rapists, kidnappers, torturers, active shooters, murderers, gangsters, drug cartels, and others.

Starting from the crimes in the pareto analysis we noticed that only tree master classes could be considered violent, that are:

500 - BURGLARY
700 - THEFT
800 - ASSAULT & BATTERY



In [110]:

    
test_top['Violent crime'] = False

test_top.loc[(test_top['Master Class'] ==  500),'Violent crime'] = True
test_top.loc[(test_top['Master Class'] ==  800),'Violent crime'] = True
test_top.loc[(test_top['Master Class'] ==  700),'Violent crime'] = True

test_top.sort_values(['Violent crime', 'frequency'], ascending=False, axis=0, kind='quicksort')









    Out[110]:






  
    
      
      Class
      Class Description
      frequency
      Master Class
      Master Class Description
      Violent crime
    
  
  
    
      15
      811
      ASSAULT & BATTERY - CITIZEN
      1.634644
      800
      ASSAULT & BATTERY
      True
    
    
      19
      821
      SIMPLE ASSAULT - CITIZEN
      1.330823
      800
      ASSAULT & BATTERY
      True
    
    
      21
      711
      AUTO THEFT - PASSENGER VEHICLE
      1.253798
      700
      THEFT
      True
    
    
      22
      813
      ASSAULT & BATTERY SPOUSE/PARTNER
      1.240960
      800
      ASSAULT & BATTERY
      True
    
    
      26
      512
      BURG FORCE-RES/DAY
      1.027002
      500
      BURGLARY
      True
    
    
      34
      513
      BURG FORCE-RES/TIME UNK
      0.787368
      500
      BURGLARY
      True
    
    
      39
      514
      BURG FORCE-COMM/NIGHT
      0.637597
      500
      BURGLARY
      True
    
    
      42
      511
      BURG FORCE-RES/NIGHT
      0.616201
      500
      BURGLARY
      True
    
    
      0
      2812
      DRIVING UNDER THE INFLUENCE
      7.317386
      2800
      DRIVING UNDER THE INFLUENCE
      False
    
    
      1
      1834
      CDS-POSS MARIJUANA/HASHISH
      5.708417
      1800
      CONTROLLED DANGEROUS SUBSTANCE POSSESSION
      False
    
    
      2
      2938
      POL INFORMATION
      5.096495
      2900
      MISC
      False
    
    
      3
      614
      LARCENY FROM AUTO OVER $200
      3.911164
      600
      LARCENY
      False
    
    
      4
      617
      LARCENY FROM BUILDING OVER $200
      3.829860
      600
      LARCENY
      False
    
    
      5
      2942
      MENTAL TRANSPORT
      3.598785
      2900
      MISC
      False
    
    
      6
      1412
      VANDALISM-MOTOR VEHICLE
      3.260730
      1400
      VANDALISM
      False
    
    
      7
      2941
      LOST PROPERTY
      3.119517
      2900
      MISC
      False
    
    
      8
      619
      LARCENY OTHER OVER $200
      2.790021
      600
      LARCENY
      False
    
    
      9
      634
      LARCENY FROM AUTO UNDER $50
      2.610296
      600
      LARCENY
      False
    
    
      10
      1013
      FORGERY/CNTRFT - IDENTITY THEFT
      2.550387
      1000
      FORGERY/CNTRFT
      False
    
    
      11
      613
      LARCENY SHOPLIFTING OVER $200
      2.263683
      600
      LARCENY
      False
    
    
      12
      623
      LARCENY SHOPLIFTING $50 - $199
      2.092516
      600
      LARCENY
      False
    
    
      13
      2216
      LIQUOR - DRINK IN PUB OVER 21
      2.062562
      2200
      LIQUOR - DRINK IN PUB OVER 21
      False
    
    
      14
      2413
      DISORDERLY CONDUCT
      1.985536
      2400
      DISORDERLY CONDUCT
      False
    
    
      16
      624
      LARCENY FROM AUTO $50 - $199
      1.523386
      600
      LARCENY
      False
    
    
      17
      1011
      FORGERY/CNTRFT-CRDT CARDS
      1.463477
      1000
      FORGERY/CNTRFT
      False
    
    
      18
      2943
      MISSING PERSON
      1.459198
      2900
      MISC
      False
    
    
      20
      1864
      CDS IMPLMNT-MARIJUANA/HASHISH
      1.270914
      1900
      CONTROLLED DANGEROUS SUBSTANCE IMPLMNT
      False
    
    
      23
      635
      LARCENY AUTO PART UNDER $50
      1.142539
      600
      LARCENY
      False
    
    
      24
      2111
      JUVENILE RUNAWAY
      1.104027
      2100
      JUVENILE RUNAWAY
      False
    
    
      25
      633
      LARCENY SHOPLIFTING UNDER $50
      1.104027
      600
      LARCENY
      False
    
    
      27
      2737
      TRESPASSING
      0.954256
      2700
      TRESPASSING
      False
    
    
      28
      2913
      SUDDEN DEATH NATURAL
      0.954256
      2900
      MISC
      False
    
    
      29
      627
      LARCENY FROM BUILDING $50-$199
      0.945697
      600
      LARCENY
      False
    
    
      30
      1411
      VANDALISM-DWELLING
      0.907185
      1400
      VANDALISM
      False
    
    
      31
      639
      LARCENY OTHER UNDER $50
      0.894347
      600
      LARCENY
      False
    
    
      32
      629
      LARCENY OTHER $50 - $199
      0.864393
      600
      LARCENY
      False
    
    
      33
      1014
      FORGERY/CNTRFT-ALL OTHER
      0.842997
      1000
      FORGERY/CNTRFT
      False
    
    
      35
      616
      LARCENY BICYCLE OVER $200
      0.761693
      600
      LARCENY
      False
    
    
      36
      1417
      VANDALISM-OTHER
      0.736018
      1400
      VANDALISM
      False
    
    
      37
      637
      LARCENY FROM BLDG UNDER $50
      0.706064
      600
      LARCENY
      False
    
    
      38
      1824
      CDS-SELL-MARIJUANA/HASHISH
      0.671830
      1800
      CONTROLLED DANGEROUS SUBSTANCE POSSESSION
      False
    
    
      40
      2946
      RECOVERED PROPERTY/MONT. CO.
      0.629038
      2900
      MISC
      False
    
    
      41
      1012
      FORGERY/CNTRFT-CHECKS
      0.624759
      1000
      FORGERY/CNTRFT
      False

From the classification made above over the most common crimes (80%) we could make a statement that only 8.53% of these kind of crimes are violent.



In [111]:

    
value_percentage = test_top[test_top['Violent crime'] == True]['frequency'].sum()
value_percentage = round(value_percentage,2)
print(str(value_percentage) + '% of the top43 crimes are violent')









    



8.53% of the top43 crimes are violent



In [130]:

    
valores = [100-value_percentage, value_percentage]
marcacoes = 'Não violentos', 'Violentos'

plt.pie(valores, labels=marcacoes, autopct='%1.2f%%', shadow=True)









    Out[130]:





([<matplotlib.patches.Wedge at 0x117a64198>,
  <matplotlib.patches.Wedge at 0x117a6aa58>],
 [<matplotlib.text.Text at 0x117a64f28>,
  <matplotlib.text.Text at 0x117a73828>],
 [<matplotlib.text.Text at 0x117a6a4e0>,
  <matplotlib.text.Text at 0x117a73da0>])

Wich period (morning, afternoon, night) of the day that most complaints occur

Processing data from 'Dispatch Date/Time'



In [13]:

    
datetime = pd.to_datetime(monty_data['Dispatch Date / Time'])#Takes too long to process



In [112]:

    
#Considering the top crimes

date_data = monty_data[['Dispatch Date / Time', 'Class']]
#Creation of a Master Class
date_data['Master Class'] = 0
aux = date_data['Master Class'].astype(float,copy=True)
date_data['Master Class'] = aux
date_data['Master Class'] = date_data['Class']/100
date_data['Master Class'] = date_data['Master Class'].round()
date_data['Master Class'] = date_data['Master Class']*100
aux = date_data['Master Class'].astype(int,copy=True)
date_data['Master Class'] = aux
#date_data.head(5)
date_data['Period of the day']=''
#print("-------------")
#TODO make a better version of this cell, the cast not worked, work around









    



/Users/marco/anaconda/lib/python3.6/site-packages/ipykernel/__main__.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/Users/marco/anaconda/lib/python3.6/site-packages/ipykernel/__main__.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/Users/marco/anaconda/lib/python3.6/site-packages/ipykernel/__main__.py:8: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/Users/marco/anaconda/lib/python3.6/site-packages/ipykernel/__main__.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/Users/marco/anaconda/lib/python3.6/site-packages/ipykernel/__main__.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/Users/marco/anaconda/lib/python3.6/site-packages/ipykernel/__main__.py:12: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/Users/marco/anaconda/lib/python3.6/site-packages/ipykernel/__main__.py:14: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

For select period of times we simply filtered the column 'Dispatch Date / Time', transformed in a proper type (datetime) with

pd.to_datetime(monty_data['Dispatch Date / Time'])

extracted only the hour out of the hole datetime structure and filtered according the period of the day. So, Morning period received all complaint starting from 5AM to 12AM. Afternoon from 1AM to 6PM and Night from 7PM to 4AM as we can se after:



In [131]:

    
morning = datetime[(datetime.dt.hour > 5) & (datetime.dt.hour <= 12)]
afternoon = datetime[(datetime.dt.hour > 12) & (datetime.dt.hour <= 18)]
night = datetime[((datetime.dt.hour > 18) & (datetime.dt.hour <= 23)) | (datetime.dt.hour > 0) & (datetime.dt.hour <= 4)]

print("In the morning we computed "+ str(morning.shape[0]) + " crimes. There is a probability of " + str(round((morning.shape[0]/23369)*100,2)) + "% of a complaint be registered in the morning") 
print("At afternoon we computed "+ str(afternoon.shape[0]) + " crimes. There is a probability of " + str(round((afternoon.shape[0]/23369)*100,2)) + "% of a complaint be registered at afternoon") 
print("In the night we computed "+ str(night.shape[0]) + " crimes. There is a probability of " + str(round((night.shape[0]/23369)*100,2)) + "% of a complaint be registered in the morning")









    



In the morning we computed 8034 crimes. There is a probability of 34.38% of a complaint be registered in the morning
At afternoon we computed 6898 crimes. There is a probability of 29.52% of a complaint be registered at afternoon
In the night we computed 7311 crimes. There is a probability of 31.29% of a complaint be registered in the morning

Wich day of the week that most complaints occur

Based on the information of the column 'Dispatch Date/Time' filtered the dates according to it's weekdays. And bellow we have that TUESDAY has the majority of occurences. That information is followed by a bar chart showing the frequency of each day.



In [135]:

    
day_of_the_week = datetime.dt.weekday_name
result = day_of_the_week.value_counts()

print('Tuesday is the day more probable to happen a crime')
result.to_frame









    



Tuesday is the day more probable to happen a crime






    Out[135]:





<bound method Series.to_frame of Tuesday      3836
Monday       3734
Wednesday    3611
Friday       3594
Thursday     3404
Saturday     2807
Sunday       2383
Name: Dispatch Date / Time, dtype: int64>



In [115]:

    
fig, ax = plt.subplots()

ind = np.arange(0, 7)
Tue = plt.bar(0, height=(result[0]/23369)*100, color='red')
Mon = plt.bar(1, height=(result[1]/23369)*100, color='green')
Wed = plt.bar(2, height=(result[2]/23369)*100, color='blue')
Fri = plt.bar(3, height=(result[3]/23369)*100, color='black')
Thu = plt.bar(4, height=(result[4]/23369)*100, color='brown')
Sat = plt.bar(5, height=(result[5]/23369)*100, color='yellow')
Sun = plt.bar(6, height=(result[6]/23369)*100, color='orange')

ax.set_xticks(ind)
ax.set_xticklabels(['Tuesday','Monday','Wednesday','Friday','Thursday','Saturday','Sunday'])

ax.set_ylim([0, 20])
plt.xticks(ind,rotation='45')
ax.set_ylabel('Percentage')
ax.set_title('Number of crimes per day of the week')

plt.show()

Analysing the distribution of violent crimes.

Looking for a way to better see how the 'Dispatch Date/Time' is distributed according the period of the day and day of the week we found a great example of a scatter chart (https://bokeh.pydata.org/en/latest/docs/user_guide/categorical.html#adding-jitter). Unfortunatelly there is a problem importing jitter to this notebook!



In [147]:

    
datetime_analys = pd.to_datetime(monty_data[(monty_data['MasterClass']==500) | (monty_data['MasterClass']==600) | (monty_data['MasterClass']==800)]['Dispatch Date / Time'])#Takes too long to process



In [148]:

    
#output_file("categorical_scatter_jitter.html")

DAYS = ['Sun', 'Sat', 'Fri', 'Thu', 'Wed', 'Tue', 'Mon']
dow = {0 : 'Sun', 1:'Mon', 2:'Tue', 3:'Wed', 4:'Thu', 5:'Fri', 6:'Sat'}
#copy of original datetime
scatter_datetime = datetime_analys
scatter_datetime = scatter_datetime.rename('complete_date')
scatter_dayofweek = datetime_analys.dt.dayofweek

scatter_dayofweek = scatter_dayofweek.rename('day_of_week')
scatter_dayofweek=scatter_dayofweek.replace(dow)

scatter_hour = datetime_analys.dt.time
scatter_hour = scatter_hour.rename('hour')

dictionary = {'datetime': scatter_datetime, 'day': scatter_dayofweek, 'time':scatter_hour}
scatter_complete = pd.DataFrame(data=dictionary)

source = ColumnDataSource(scatter_complete)

p = figure(plot_width=950, plot_height=500, y_range=DAYS, x_axis_type='datetime',
           title="Violent complaints by Time of Day (US/Central) - Montgomey County - 2013")

#p.circle(x='time', y=Jitter('day', width=0.6, range=p.y_range),  source=source, alpha=0.3)
p.circle(x='time', y='day',  source=source, alpha=0.1)
#p.circle(x='time', y=Jitter('day', width=0.6, mean=0, distribution='uniform', range=p.y_range,),  source=source, alpha=0.3)
#p.circle(x='time', y=Jitter(name='day', width=0.6, mean=0, distribution='uniform'),  source=source, alpha=0.3)
#p.circle(x='time', y='day',  source=source, alpha=0.3, size=1)
#field_name, width, mean=0, distribution='uniform', range=None
#p.circle(x='time', y=field('day', Jitter(mean=0, width=0.6, distribution='uniform', range=None)),  source=source, alpha=0.3)
p.xaxis[0].formatter.days = ['%Hh']
p.x_range.range_padding = 0
p.ygrid.grid_line_color = None



In [149]:

    
show(p,notebook_handle=True)









    






    







    














    Out[149]:




<Bokeh Notebook handle for In[275]>

As we can see above there is a concentration starting from aroud 7:00AM in the majority of the days. On Friday and Saturday the are fewer complaints registered.

Making the same kind of plot with the data of the 'Start Time' the is visible the difference between the two plots. I this second approach there is a shift in the time of the day as we can see below.



In [150]:

    
datetime_analys_start = pd.to_datetime(monty_data[(monty_data['MasterClass']==500) | (monty_data['MasterClass']==600) | (monty_data['MasterClass']==800)]['Start Date / Time'])#Takes too long to process



In [151]:

    
scatter_datetime_start = datetime_analys_start
scatter_datetime_start = scatter_datetime_start.rename('complete_date')
scatter_dayofweek_start = datetime_analys_start.dt.dayofweek

scatter_dayofweek_start = scatter_dayofweek_start.rename('day_of_week')
scatter_dayofweek_start = scatter_dayofweek_start.replace(dow)

scatter_hour_start = datetime_analys_start.dt.time
scatter_hour_start = scatter_hour_start.rename('hour')

dictionary_start = {'datetime': scatter_datetime_start, 'day': scatter_dayofweek_start, 'time':scatter_hour_start}
scatter_complete_start = pd.DataFrame(data=dictionary_start)

source_start = ColumnDataSource(scatter_complete_start)

p = figure(plot_width=950, plot_height=500, y_range=DAYS, x_axis_type='datetime',
           title="Violent complaints by Time of Day (US/Central) - Start Time - Montgomey County - 2013")

p.circle(x='time', y='day',  source=source_start, alpha=0.1)
p.xaxis[0].formatter.days = ['%Hh']
p.x_range.range_padding = 0
p.ygrid.grid_line_color = None



In [152]:

    
show(p,notebook_handle=True)









    






    







    














    Out[152]:




<Bokeh Notebook handle for In[275]>

Wich month of the years that most complaints occur

Now is time to see how are the behaviour according to the month of the Dispatch Time. First of all, we can detect there is a lack of information from the first semester of the year, chart below. So, considering only the second half of the year the month with more complaints is OCTOBER with almost 17.5%



In [153]:

    
month_of_the_year = datetime.dt.month

result_month = month_of_the_year.value_counts()

print('October is the month more likely to happen a crime with a probability of '+ str((result_month[10]/23369)*100)+'%')
number_of_crimes = sum(result_month)
print('Total number of complaints: ' + str(number_of_crimes))
result_month.to_frame

fig1, ax1 = plt.subplots()

ind = np.arange(0, 13)
Oct = plt.bar(10,height=(result_month[10]/23369)*100,color='indigo')
Aug = plt.bar(8,height=(result_month[8]/23369)*100,color='cyan')
Nov = plt.bar(11,height=(result_month[11]/23369)*100,color='blue')
Sep = plt.bar(9,height=(result_month[9]/23369)*100,color='lightblue')
Dec = plt.bar(12,height=(result_month[12]/23369)*100,color='purple')
Jul = plt.bar(7,height=(result_month[7]/23369)*100,color='magenta')

ax.set_xticks(ind)
ax.set_xticklabels(['January','February','March','April','May','June','July','August','September','October','November','December'])
ax.set_ylim([0, 20])
plt.xticks(ind,rotation='45')
#plt.xlabel.
ax.set_ylabel('Percentage')
ax.set_title('Number of crimes per month of the year')

plt.show()









    



October is the month more likely to happen a crime with a probability of 17.4376310497%
Total number of complaints: 23369

Gattering all together we have that in 2013, if you are in the month of September in the period of the morning of a Tuesday is better you be cautious because that is the combination more probable to you commit a crime or be victim of a crime #GoodLuck :)

Location of crimes

The dataset contains the latitude and longitude of the complaints. This data could be utilized to see in a map how the crimes are distributed.

Configuring maps and loading data about where the complaints have occured. Observe, to sucesfully configure the Google Maps you have to create an API Key (You can generate one from this site: https://developers.google.com/maps/documentation/javascript/get-api-key) and change in the line 'plot.api_key = ""'



In [154]:

    
map_options = GMapOptions(lat=39.151040, lng=-77.193020, map_type="roadmap", zoom=11)

plot = GMapPlot(x_range=DataRange1d(), y_range=DataRange1d(), map_options=map_options)
plot.title.text = "Montgomery County"

# For GMaps to function, Google requires you obtain and enable an API key:
#
#     https://developers.google.com/maps/documentation/javascript/get-api-key
#
# Replace the value below with your personal API key:
plot.api_key = "AIzaSyBFHmpkUOfk2FtDZXHVBSUUHp6LVPmI-fs"

Load data in using read_csv function and filtering Latitude and Longitude data. After this showing a preview of the data loaded.



In [29]:

    
#Reference https://www.census.gov/data/tables/2016/demo/popest/total-cities-and-towns.html 
#https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?fpt=table
pop_data = pd.read_csv("PEP_2016_PEPANNRES.csv",sep=';')

#Adjusting data
#Removing city configuration
pop_data['GEO.display-label'] = pop_data['GEO.display-label'].str.replace("city, Maryland","")
pop_data['GEO.display-label'] = pop_data['GEO.display-label'].str.replace("town, Maryland","")
pop_data['GEO.display-label'] = pop_data['GEO.display-label'].str.replace("village, Maryland","")

pop_data=pop_data.rename_axis({'GEO.display-label': "City"}, axis="columns")
pop_data['City'] = pop_data['City'].str.upper()
pop_data['City'] = pop_data['City'].str.strip() #Merge not working 'cause of a ' '

#Now the data was merged with data from population
pop_data_montydata = monty_data.merge(pop_data, left_on=['City'], right_on=['City'], how='inner')
pop_data[['City','respop72013']].sort_values(by=['respop72013'],ascending=False)









    Out[29]:






  
    
      
      City
      respop72013
    
  
  
    
      7
      GAITHERSBURG
      65761
    
    
      15
      ROCKVILLE
      63736
    
    
      17
      TAKOMA PARK
      17503
    
    
      14
      POOLESVILLE
      5068
    
    
      2
      CHEVY CHASE
      2930
    
    
      10
      KENSINGTON
      2331
    
    
      6
      CHEVY CHASE VILLAGE
      2026
    
    
      16
      SOMERSET
      1249
    
    
      8
      GARRETT PARK
      1026
    
    
      12
      MARTIN'S ADDITIONS
      980
    
    
      5
      CHEVY CHASE VIEW
      967
    
    
      4
      CHEVY CHASE SECTION THREE
      777
    
    
      3
      CHEVY CHASE SECTION FIVE
      698
    
    
      13
      NORTH CHEVY CHASE
      575
    
    
      18
      WASHINGTON GROVE
      551
    
    
      11
      LAYTONSVILLE
      367
    
    
      9
      GLEN ECHO
      265
    
    
      0
      BARNESVILLE
      178
    
    
      1
      BROOKEVILLE
      132

Which city/town/village/small village and etc are responsible



In [30]:

    
monty_data.columns
#monty_data[monty_data['City']].value_counts
cities_data = pd.DataFrame({'freq':monty_data['City'].value_counts(normalize=True,sort=True,ascending=False)*100,'count':monty_data['City'].value_counts(sort=True,ascending=False)})
cities_data['per_hundred_thousand'] = cities_data['count']/100000 
#cities_data.merge(pop_data, left_on=['object'], right_on=['City'], how='inner')
cities_data









    Out[30]:






  
    
      
      count
      freq
      per_hundred_thousand
    
  
  
    
      SILVER SPRING
      8626
      36.912149
      0.08626
    
    
      ROCKVILLE
      3453
      14.775985
      0.03453
    
    
      GAITHERSBURG
      3403
      14.562027
      0.03403
    
    
      GERMANTOWN
      2170
      9.285806
      0.02170
    
    
      BETHESDA
      1736
      7.428645
      0.01736
    
    
      MONTGOMERY VILLAGE
      687
      2.939792
      0.00687
    
    
      POTOMAC
      527
      2.255124
      0.00527
    
    
      CHEVY CHASE
      498
      2.131028
      0.00498
    
    
      OLNEY
      380
      1.626086
      0.00380
    
    
      KENSINGTON
      363
      1.553340
      0.00363
    
    
      BURTONSVILLE
      304
      1.300869
      0.00304
    
    
      DERWOOD
      270
      1.155377
      0.00270
    
    
      DAMASCUS
      230
      0.984210
      0.00230
    
    
      CLARKSBURG
      173
      0.740297
      0.00173
    
    
      TAKOMA PARK
      141
      0.603363
      0.00141
    
    
      POOLESVILLE
      105
      0.449313
      0.00105
    
    
      BOYDS
      90
      0.385126
      0.00090
    
    
      BROOKEVILLE
      70
      0.299542
      0.00070
    
    
      SANDY SPRING
      43
      0.184004
      0.00043
    
    
      DICKERSON
      26
      0.111259
      0.00026
    
    
      ASHTON
      19
      0.081304
      0.00019
    
    
      CABIN JOHN
      18
      0.077025
      0.00018
    
    
      SPENCERVILLE
      9
      0.038513
      0.00009
    
    
      WASHINGTON GROVE
      6
      0.025675
      0.00006
    
    
      BRINKLOW
      5
      0.021396
      0.00005
    
    
      GLEN ECHO
      4
      0.017117
      0.00004
    
    
      BARNESVILLE
      4
      0.017117
      0.00004
    
    
      MOUNT AIRY
      3
      0.012838
      0.00003
    
    
      BEALLSVILLE
      2
      0.008558
      0.00002
    
    
      LAUREL
      2
      0.008558
      0.00002
    
    
      HYATTSVILLE
      1
      0.004279
      0.00001
    
    
      KISSIMMEE
      1
      0.004279
      0.00001

Interesting fact is that SILVER SPRING does not appear in the census dataset collected. SILVER SPRING is classified as 'Unincorporated Comunity' and we do not have this kind of classification here. All municipalities are 'Incorporated comunitie' in Brazil.

Ref:https://en.wikipedia.org/wiki/Silver_Spring,_Maryland

Ref:https://en.wikipedia.org/wiki/Unincorporated_community

Bellow there is a distribuition of the amount of crimes per municipalities



In [165]:

    
fig, axes = plt.subplots(nrows=2, ncols=1,figsize=(40,40))

cities_data['count'].plot(kind='bar',ax=axes[0])
axes[0].set_title('Count of crimes')









    Out[165]:





<matplotlib.text.Text at 0x1211408d0>



In [32]:

    
monty_data['Class'].unique
freq_data = monty_data['Class'].value_counts(normalize=True, sort=True, ascending=False)
freq_data.to_frame
chart_data = monty_data['Class'].value_counts(normalize=True,sort=True,ascending=False)#.to_dict(orient='series')
print(chart_data.head(10))









    



2812    0.073174
1834    0.057084
2938    0.050965
614     0.039112
617     0.038299
2942    0.035988
1412    0.032607
2941    0.031195
619     0.027900
634     0.026103
Name: Class, dtype: float64

Configuring Bokeh to utilize Google Maps.



In [166]:

    
map_options2 = GMapOptions(lat=39.151042, lng=-77.193023, map_type="roadmap", zoom=11)

plot2 = GMapPlot(x_range=DataRange1d(), y_range=DataRange1d(), map_options=map_options2)
plot2.title.text = "Montgomery County"

# For GMaps to function, Google requires you obtain and enable an API key:
#
#     https://developers.google.com/maps/documentation/javascript/get-api-key
#
# Replace the value below with your personal API key:
plot2.api_key = "AIzaSyBFHmpkUOfk2FtDZXHVBSUUHp6LVPmI-fs"



In [169]:

    
violent_data = monty_data[(monty_data['MasterClass']==500) | (monty_data['MasterClass']==700) | (monty_data['MasterClass']==800)]
non_violent_data = monty_data[~((monty_data['MasterClass']==500) | (monty_data['MasterClass']==700) | (monty_data['MasterClass']==800))]

source_violent = ColumnDataSource(
    data=dict(
        lat=violent_data["Latitude"],
        lon=violent_data["Longitude"],
    )
)
source_non_violent = ColumnDataSource(
    data=dict(
        lat=non_violent_data["Latitude"],
        lon=non_violent_data["Longitude"],
    )
)

###print(source.data.values)
circle_red = Circle(x="lon", y="lat", size=3, fill_color="red", fill_alpha=0.8, line_color=None)
circle_blue = Circle(x="lon", y="lat", size=3, fill_color="green", fill_alpha=0.8, line_color=None)
plot2.add_glyph(source_violent, circle_red)
plot2.add_glyph(source_non_violent, circle_blue)
#
plot2.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())



In [170]:

    
show(plot2,notebook_handle=True)









    






    







    














    Out[170]:




<Bokeh Notebook handle for In[275]>



In [155]:

    
source = ColumnDataSource(
    data=dict(
        lat=latitude_data[13:130],
        lon=longitude_data[13:130],
    )
)

print(source.data.values)
circle = Circle(x="lon", y="lat", size=1, fill_color="blue", fill_alpha=0.8, line_color=None)
plot.add_glyph(source, circle)

plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())









    



<built-in method values of PropertyValueColumnData object at 0x117c343b8>

Ploting the geographic data in Google Maps. Note that the 'show' function receives another parameter 'notebook_handle=True' responsible for tell Bhoke to do a inline plot



In [58]:

    
show(plot,notebook_handle=True)









    






    







    














    Out[58]:




<Bokeh Notebook handle for In[58]>

Is it possible to make a correlation between the Unemployment rate and the crimes commited?

First of all wme need a good source of information about Unemployment rate. We could find onde dataset about this subject in the



In [43]:

    
#unemployment = pd.read_csv("Unemployment_MontgomeryCounty_2005to2017.csv",sep=';')
#unemployment

	Incident ID	CR Number	Dispatch Date / Time	Class	Class Description	Police District Name	Block Address	City	State	Zip Code	...	Sector	Beat	PRA	Start Date / Time	End Date / Time	Latitude	Longitude	Police District Number	Location	Address Number
0	200939101	13047006	10/02/2013 07:52:41 PM	511	BURG FORCE-RES/NIGHT	OTHER	25700 MT RADNOR DR	DAMASCUS	MD	20872.0	...	NaN	NaN	NaN	10/02/2013 07:52:00 PM	NaN	NaN	NaN	OTHER	NaN	25700.0
1	200952042	13062965	12/31/2013 09:46:58 PM	1834	CDS-POSS MARIJUANA/HASHISH	GERMANTOWN	GUNNERS BRANCH RD	GERMANTOWN	MD	20874.0	...	M	5M1	470.0	12/31/2013 09:46:00 PM	NaN	NaN	NaN	5D	NaN	NaN
2	200926636	13031483	07/06/2013 09:06:24 AM	1412	VANDALISM-MOTOR VEHICLE	MONTGOMERY VILLAGE	OLDE TOWNE AVE	GAITHERSBURG	MD	20877.0	...	P	6P3	431.0	07/06/2013 09:06:00 AM	NaN	NaN	NaN	6D	NaN	NaN
3	200929538	13035288	07/28/2013 09:13:15 PM	2752	FUGITIVE FROM JUSTICE(OUT OF STATE)	BETHESDA	BEACH DR	CHEVY CHASE	MD	20815.0	...	D	2D1	11.0	07/28/2013 09:13:00 PM	NaN	NaN	NaN	2D	NaN	NaN
4	200930689	13036876	08/06/2013 05:16:17 PM	2812	DRIVING UNDER THE INFLUENCE	BETHESDA	BEACH DR	SILVER SPRING	MD	20815.0	...	D	2D3	178.0	08/06/2013 05:16:00 PM	NaN	NaN	NaN	2D	NaN	NaN

	Class	Class Description	frequency
0	2812	DRIVING UNDER THE INFLUENCE	7.317386
1	1834	CDS-POSS MARIJUANA/HASHISH	5.708417
2	2938	POL INFORMATION	5.096495
3	614	LARCENY FROM AUTO OVER $200	3.911164
4	617	LARCENY FROM BUILDING OVER $200	3.829860
5	2942	MENTAL TRANSPORT	3.598785
6	1412	VANDALISM-MOTOR VEHICLE	3.260730
7	2941	LOST PROPERTY	3.119517
8	619	LARCENY OTHER OVER $200	2.790021
9	634	LARCENY FROM AUTO UNDER $50	2.610296
10	1013	FORGERY/CNTRFT - IDENTITY THEFT	2.550387
11	613	LARCENY SHOPLIFTING OVER $200	2.263683
12	623	LARCENY SHOPLIFTING $50 - $199	2.092516
13	2216	LIQUOR - DRINK IN PUB OVER 21	2.062562
14	2413	DISORDERLY CONDUCT	1.985536
15	811	ASSAULT & BATTERY - CITIZEN	1.634644
16	624	LARCENY FROM AUTO $50 - $199	1.523386
17	1011	FORGERY/CNTRFT-CRDT CARDS	1.463477
18	2943	MISSING PERSON	1.459198
19	821	SIMPLE ASSAULT - CITIZEN	1.330823
20	1864	CDS IMPLMNT-MARIJUANA/HASHISH	1.270914
21	711	AUTO THEFT - PASSENGER VEHICLE	1.253798
22	813	ASSAULT & BATTERY SPOUSE/PARTNER	1.240960
23	635	LARCENY AUTO PART UNDER $50	1.142539
24	2111	JUVENILE RUNAWAY	1.104027
25	633	LARCENY SHOPLIFTING UNDER $50	1.104027
26	512	BURG FORCE-RES/DAY	1.027002
27	2737	TRESPASSING	0.954256
28	2913	SUDDEN DEATH NATURAL	0.954256
29	627	LARCENY FROM BUILDING $50-$199	0.945697
30	1411	VANDALISM-DWELLING	0.907185
31	639	LARCENY OTHER UNDER $50	0.894347
32	629	LARCENY OTHER $50 - $199	0.864393
33	1014	FORGERY/CNTRFT-ALL OTHER	0.842997
34	513	BURG FORCE-RES/TIME UNK	0.787368
35	616	LARCENY BICYCLE OVER $200	0.761693
36	1417	VANDALISM-OTHER	0.736018
37	637	LARCENY FROM BLDG UNDER $50	0.706064
38	1824	CDS-SELL-MARIJUANA/HASHISH	0.671830
39	514	BURG FORCE-COMM/NIGHT	0.637597
40	2946	RECOVERED PROPERTY/MONT. CO.	0.629038
41	1012	FORGERY/CNTRFT-CHECKS	0.624759
42	511	BURG FORCE-RES/NIGHT	0.616201

	Incident ID	CR Number	Dispatch Date / Time	Class	Class Description	Police District Name	Block Address	City	State	Zip Code	...	Beat	PRA	Start Date / Time	End Date / Time	Latitude	Longitude	Police District Number	Location	Address Number	MasterClass
17	200925988	13030561	07/01/2013 03:26:03 AM	1834	CDS-POSS MARIJUANA/HASHISH	SILVER SPRING	12200 NEW HAMPSHIRE AVE	SILVER SPRING	MD	20904.0	...	3I1	519.0	07/01/2013 03:26:00 AM	NaN	39.055327	-76.995580	3D	(39.055326908694369, -76.99557965357296)	12200.0	1800
18	200925991	13030553	07/01/2013 01:03:17 AM	2812	DRIVING UNDER THE INFLUENCE	SILVER SPRING	8700 PLYMOUTH ST	SILVER SPRING	MD	20901.0	...	3H1	126.0	07/01/2013 01:03:00 AM	NaN	38.999406	-77.007736	3D	(38.999406266117461, -77.007735832110484)	8700.0	2800
19	200925992	13030559	07/01/2013 01:49:33 AM	1833	CDS-POSS COCAINE& DERIVATIVES	GERMANTOWN	12700 MIDDLEBROOK RD	GERMANTOWN	MD	20874.0	...	5N1	595.0	07/01/2013 01:49:00 AM	NaN	39.176504	-77.263908	5D	(39.176504471375559, -77.263907914323482)	12700.0	1800

	Class	frequency
Master Class
500	2050	3.068167
600	8760	25.439685
700	711	1.253798
800	2445	4.206427
1000	4050	5.481621
1400	4240	4.903933
1800	3658	6.380247
1900	1864	1.270914
2100	2111	1.104027
2200	2216	2.062562
2400	2413	1.985536
2700	2737	0.954256
2800	2812	7.317386
2900	17623	14.857290

	Class	Class Description	frequency	Master Class	Master Class Description	Violent crime
15	811	ASSAULT & BATTERY - CITIZEN	1.634644	800	ASSAULT & BATTERY	True
19	821	SIMPLE ASSAULT - CITIZEN	1.330823	800	ASSAULT & BATTERY	True
21	711	AUTO THEFT - PASSENGER VEHICLE	1.253798	700	THEFT	True
22	813	ASSAULT & BATTERY SPOUSE/PARTNER	1.240960	800	ASSAULT & BATTERY	True
26	512	BURG FORCE-RES/DAY	1.027002	500	BURGLARY	True
34	513	BURG FORCE-RES/TIME UNK	0.787368	500	BURGLARY	True
39	514	BURG FORCE-COMM/NIGHT	0.637597	500	BURGLARY	True
42	511	BURG FORCE-RES/NIGHT	0.616201	500	BURGLARY	True
0	2812	DRIVING UNDER THE INFLUENCE	7.317386	2800	DRIVING UNDER THE INFLUENCE	False
1	1834	CDS-POSS MARIJUANA/HASHISH	5.708417	1800	CONTROLLED DANGEROUS SUBSTANCE POSSESSION	False
2	2938	POL INFORMATION	5.096495	2900	MISC	False
3	614	LARCENY FROM AUTO OVER $200	3.911164	600	LARCENY	False
4	617	LARCENY FROM BUILDING OVER $200	3.829860	600	LARCENY	False
5	2942	MENTAL TRANSPORT	3.598785	2900	MISC	False
6	1412	VANDALISM-MOTOR VEHICLE	3.260730	1400	VANDALISM	False
7	2941	LOST PROPERTY	3.119517	2900	MISC	False
8	619	LARCENY OTHER OVER $200	2.790021	600	LARCENY	False
9	634	LARCENY FROM AUTO UNDER $50	2.610296	600	LARCENY	False
10	1013	FORGERY/CNTRFT - IDENTITY THEFT	2.550387	1000	FORGERY/CNTRFT	False
11	613	LARCENY SHOPLIFTING OVER $200	2.263683	600	LARCENY	False
12	623	LARCENY SHOPLIFTING $50 - $199	2.092516	600	LARCENY	False
13	2216	LIQUOR - DRINK IN PUB OVER 21	2.062562	2200	LIQUOR - DRINK IN PUB OVER 21	False
14	2413	DISORDERLY CONDUCT	1.985536	2400	DISORDERLY CONDUCT	False
16	624	LARCENY FROM AUTO $50 - $199	1.523386	600	LARCENY	False
17	1011	FORGERY/CNTRFT-CRDT CARDS	1.463477	1000	FORGERY/CNTRFT	False
18	2943	MISSING PERSON	1.459198	2900	MISC	False
20	1864	CDS IMPLMNT-MARIJUANA/HASHISH	1.270914	1900	CONTROLLED DANGEROUS SUBSTANCE IMPLMNT	False
23	635	LARCENY AUTO PART UNDER $50	1.142539	600	LARCENY	False
24	2111	JUVENILE RUNAWAY	1.104027	2100	JUVENILE RUNAWAY	False
25	633	LARCENY SHOPLIFTING UNDER $50	1.104027	600	LARCENY	False
27	2737	TRESPASSING	0.954256	2700	TRESPASSING	False
28	2913	SUDDEN DEATH NATURAL	0.954256	2900	MISC	False
29	627	LARCENY FROM BUILDING $50-$199	0.945697	600	LARCENY	False
30	1411	VANDALISM-DWELLING	0.907185	1400	VANDALISM	False
31	639	LARCENY OTHER UNDER $50	0.894347	600	LARCENY	False
32	629	LARCENY OTHER $50 - $199	0.864393	600	LARCENY	False
33	1014	FORGERY/CNTRFT-ALL OTHER	0.842997	1000	FORGERY/CNTRFT	False
35	616	LARCENY BICYCLE OVER $200	0.761693	600	LARCENY	False
36	1417	VANDALISM-OTHER	0.736018	1400	VANDALISM	False
37	637	LARCENY FROM BLDG UNDER $50	0.706064	600	LARCENY	False
38	1824	CDS-SELL-MARIJUANA/HASHISH	0.671830	1800	CONTROLLED DANGEROUS SUBSTANCE POSSESSION	False
40	2946	RECOVERED PROPERTY/MONT. CO.	0.629038	2900	MISC	False
41	1012	FORGERY/CNTRFT-CHECKS	0.624759	1000	FORGERY/CNTRFT	False

	City	respop72013
7	GAITHERSBURG	65761
15	ROCKVILLE	63736
17	TAKOMA PARK	17503
14	POOLESVILLE	5068
2	CHEVY CHASE	2930
10	KENSINGTON	2331
6	CHEVY CHASE VILLAGE	2026
16	SOMERSET	1249
8	GARRETT PARK	1026
12	MARTIN'S ADDITIONS	980
5	CHEVY CHASE VIEW	967
4	CHEVY CHASE SECTION THREE	777
3	CHEVY CHASE SECTION FIVE	698
13	NORTH CHEVY CHASE	575
18	WASHINGTON GROVE	551
11	LAYTONSVILLE	367
9	GLEN ECHO	265
0	BARNESVILLE	178
1	BROOKEVILLE	132

	count	freq	per_hundred_thousand
SILVER SPRING	8626	36.912149	0.08626
ROCKVILLE	3453	14.775985	0.03453
GAITHERSBURG	3403	14.562027	0.03403
GERMANTOWN	2170	9.285806	0.02170
BETHESDA	1736	7.428645	0.01736
MONTGOMERY VILLAGE	687	2.939792	0.00687
POTOMAC	527	2.255124	0.00527
CHEVY CHASE	498	2.131028	0.00498
OLNEY	380	1.626086	0.00380
KENSINGTON	363	1.553340	0.00363
BURTONSVILLE	304	1.300869	0.00304
DERWOOD	270	1.155377	0.00270
DAMASCUS	230	0.984210	0.00230
CLARKSBURG	173	0.740297	0.00173
TAKOMA PARK	141	0.603363	0.00141
POOLESVILLE	105	0.449313	0.00105
BOYDS	90	0.385126	0.00090
BROOKEVILLE	70	0.299542	0.00070
SANDY SPRING	43	0.184004	0.00043
DICKERSON	26	0.111259	0.00026
ASHTON	19	0.081304	0.00019
CABIN JOHN	18	0.077025	0.00018
SPENCERVILLE	9	0.038513	0.00009
WASHINGTON GROVE	6	0.025675	0.00006
BRINKLOW	5	0.021396	0.00005
GLEN ECHO	4	0.017117	0.00004
BARNESVILLE	4	0.017117	0.00004
MOUNT AIRY	3	0.012838	0.00003
BEALLSVILLE	2	0.008558	0.00002
LAUREL	2	0.008558	0.00002
HYATTSVILLE	1	0.004279	0.00001
KISSIMMEE	1	0.004279	0.00001